Lexical Categories for Improved Parsing of Web Data
نویسندگان
چکیده
We investigate the use of features expressing lexical generalizations over word forms when parsing web data and experiment with a range of web text samples, taken from the Ontonotes corpus, as well as the web 2.0 data sets described in Foster et al. (2011b). We obtain significant improvements for a standard data-driven dependency parser when incorporating features expressing these lexical categories, and in fact find that we may dispense with word form features altogether and still observe the same levels of improvement.
منابع مشابه
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملSmoothing fine-grained PCFG lexicons
We present an approach for smoothing treebank-PCFG lexicons by interpolating treebank lexical parameter estimates with estimates obtained from unannotated data via the Inside-outside algorithm. The PCFG has complex lexical categories, making relative-frequency estimates from a treebank very sparse. This kind of smoothing for complex lexical categories results in improved parsing performance, wi...
متن کاملImproved Large Margin Dependency Parsing via Local Constraints and Laplacian Regularization
We present an improved approach for learning dependency parsers from treebank data. Our technique is based on two ideas for improving large margin training in the context of dependency parsing. First, we incorporate local constraints that enforce the correctness of each individual link, rather than just scoring the global parse tree. Second, to cope with sparse data, we smooth the lexical param...
متن کاملApplying Semantic Parsing to Question Answering Over Linked Data: Addressing the Lexical Gap
Question answering over linked data has emerged in the past years as an important topic of research in order to provide natural language access to a growing body of linked open data on the Web. In this paper we focus on analyzing the lexical gap that arises as a challenge for any such question answering system. The lexical gap refers to the mismatch between the vocabulary used in a user questio...
متن کاملExpert Discovery: A web mining approach
Expert discovery is a quest in search of finding an answer to a question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” Expert with domain knowledge in any field is crucial for consulting in industry, academia and scientific community. Aim of this study is to address the issues for expert-finding task in real-world community. Collabor...
متن کامل